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1. INTRODUCTION 

Increased environment contamination is one of the most urgent problems we are facing these days. 
Especially, the industrialization of China makes air quality worse and worse. A great deal of air pollutants 
come from burning fossil fuels to obtain electricity [1]. Renewable energy, such as wind and sunlight, is the 
most promising solution to this problem, as they can generate energy without greenhouse gas emissions. 
However, its intermittent nature prevents itself from being seamlessly integrated into the current energy grid 
or entirely replacing legacy energy generation mechanisms. Indispensably, the renewable energy integration 
needs electricity reserve units to cope with the time disparity between energy generation and consumption. 
Their efficient management is the key not only to blend more renewable energy in our power systems but 
also to reduce the cost of excessive reserves [2]. 

A grid can make an energy generation plan according to the forecast on how much renewable 
energy will be available on the next day or the next few hours in addition to the traditional demand forecast 
[3], [4]. Generally, the prediction of energy availability can be done based on historical statistics or on 
relevant spatial and temporal parameters [5]. In the example of solar energy, irradiance will be the most 
important entity. As for history data analysis, most modern renewable energy generators are able to capture 
their operation status to report to a central manager or store for further analysis [6]. Those datasets allow us 
to conduct diverse analysis to better understand the operation of facilities and make a prediction model. 
Particularly, solar energy generation is deeply dependent on climate conditions. Hence, we can enhance the 
accuracy of prediction models by the integration of diverse data streams. Here, the prediction model will be 
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different region by region, making it is necessary to select a best modeling scheme appropriate to the target 
region and dataset [7]. 

This paper begins with collecting data streams from multiple solar panels in a microgrid, 
specifically, Jeju National University, Republic of Korea. Their operation behaviors are traced to develop a 
prediction model of the amount of solar energy generation for the next hour or day. This approach takes open 
software solutions for data management, analysis, and visualization. The data stream is stored in MySQL 
database and a series of queries are designed and issued to this database table. Additionally, the query results, 
significantly cut down in size, are given to the R statistics package to invoke advanced machine learning 
APIs and elaborate visualization tools [8]. Specifically, the query results are loaded to the R space either 
directly via the RMySQL library or by the import command towards the text file downloaded from the 
MySQL machine. In addition, for the sake of applying a more efficient machine learning library, namely, 
FANN (Fast Artificial Neural Network), query results are cooked to learning patterns specified by FANN [9]. 

The rest of this paper is organized as follows: After outlining the paper in Section 1, Section 2 
describes the data acquisition process. Section 3 extensively investigates the observation parameters and 
discusses the result. Finally, Section 4 concludes this paper with a brief introduction of future work. 


2. DATA ACQUISITION 

Figure 1 shows our data acquisition process. After getting an endorsement from the facility 
management office, our research team obtains the operation records of 7 solar panels over the period from 
January 2016 to March 2017. The archives are given as Microsoft Excel files. We implement a data parser by 
Python, which provides comprehensive data interfaces for Excel, Jason, XML, and many others. The parser 
reads each station record and field one by one to create a series of SQL insert statements. Now, the SQL 
statements are uploaded to the MySQL machine and executed to insert each record sequentially. In this 
process, we define a database table containing timestamps and the current amount of energy generation at 7 
places. Any queries can be issued to this table via the R package or in the command line interpreter. It must 
be mentioned that some fields has been corrupted and their values get out of the valid range, that is, the 
power generation capacity. Those fields will be simply nullified in the database, as missing value 
interpolation is not our concern [11]. 
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Figure 1. Data acquisition 


3. DATA ANALYSIS 

We name each solar panel facility from Loc1 to Loc7, and they are located over the university area. 
Their power capacities are 30, 40, 30, 30, 30, 30, 90, and 60 kw, respectively. Among these, the last three 
began their operations last December, January, and March, respectively. For them, the amount of electricity 
generation during the non-working period will be null. Figure 2 plots the valid record ratio since 2016 
January. For the first 4 places, ratios are almost 1.0, while the invalidity comes from the intermittent 
malfunction of the acquisition equipment. On the contrary, for the last 3 places, valid record ratios are quite 
low, as they have been working just for a few months. Anyway, Figure 2(a) shows the data characteristics in 
our system. Additionally, Figure 2(b) shows the monthly number of valid records. The number of records 
becomes nonzero only after a facility begins its operation. The management system has undergone system 
upgrade, failure remedy, and safety investigation. The number of valid records is most affected by this 
common factor, while the device-level malfunction is not significant. 
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Figure 2. Validity ratio statistics 


Next, Figure 3 shows the average amount of energy generation for each facility. The value is 
obtained by averaging the generation amount for each record. Actually, during the night time, solar energy 
generation cannot take place at all. Hence, the human operator manually turns off the monitoring system 
from time to time. Those time intervals are excluded in calculating the average. According to Figure 3(a), 
Loc6, having the generation capacity of 90 kw, shows the highest average. We can see that those places 
having the same capacity show almost the same generation amount. This result indicates that equipment-level 
difference is quite negligible. In addition, Figure 3(b) shows the monthly average of each facility in a 
university-scale microgrid. Here again, the monthly behaviors are almost equal for the facilities having the 
same capacity. The maximum difference is observed to be 7.1% in April 2016. We can see the same pattern 
in all of the time series. 
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Figure 3. Average amount of energy generation 


Figure 4 traces the maximum generation amount for all facilities during the whole operation periods. 
As expected, the maximum amount is limited by the power capacity for each facility. According to 
Figure 4(a), each generator approaches its full capacity by from 90.0 to 99.7%. The natural worn-out of 
equipment will worsen this ratio. It will give a guideline on when to replace the equipment. In those days 
having the best conditions for solar energy generation, that is, high-insolation days, each facility reaches its 
maximum. In addition, Figure 4(b) shows the monthly maximum generation. In August 2016, the monitoring 
system has been shut down for some component exchanges. Even during the operation time, records have 
been hardly valid. Hence, as in the case of Figure 3(b), Figure 4(b) also has a hole in this month. 
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Figure 4. Maximum amount of energy generation 


Figure 5 plots the total amount of electricity generation for each facility. As each record contains 
just the snapshot value at a specific time instant, the exact amount can be different. However, as the solar 
energy generation does not change sharply in a single day, the add-up of each snapshot value provides 
sufficiently accurate estimation. During the whole investigation period, the accumulated amount is almost 
linear to the full capacity of a generator, as shown in Figure 5(a). The last 3 have smaller amount in all-time 
generation, as they begun working recently. Figure 5(b) shows the monthly amount of energy generated at 
each solar panel. 7 panels show the similar curves, but the panel having larger capacity seems to drop more 
sharply. That is, there exist those days in which the energy can be generated between 30 an 90kw. 
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Figure 5. Total amount of energy generation 


Now, Figure 6 shows the day-by-day amount of genrration along the whole investigation period for 
4 panels working from the beginning. We can see the period in which the monitoring system has stopped 
working for 45 days. Even though the figure traces the daily total generation, the average and maximum 
amounts also show the same pattern. The daily difference definitely comes from different climate conditions, 
particularly, insolation and cloudiness. According to our observation, the solar energy generation lasts at 
most 11 hours a day. For only a few hours around 1 PM, the generation amount approaches the full capacity. 
For Loc2 having the full capacity of 40kw shows 290kwh on the day of best climate condition. With 4 panels, 
the microgrid obtains up to about 800kwh at maximum in a day. It’s not so much, compared with the total 
daily consumption in the university, namely, about 50Mwh. However, as more panels (Loc5, Loc6, and 
Loc7) are installed, the coverage of renewable energy will increase more than double. Up to now, 
intermittency does not matter, as the solar energy generation takes place during the hour of hot consumption 
in the university and can be entirely consumed. 
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Figure 6. Daily generation pattern 


Finally, Figure 7 and Figure 8 build trace and prediction models for Locl and Loc2, whose full 
generation capacities are 30 and 40kw, respectively. As the graph will look so dense if we plot from January 
2016, these graphs show from 2016-08-08, when the monitoring system restarted after the system upgrade. 
The prediction model is built exploiting the FANN (Fast Artificial Neural Network) library, which provides 
comprehensive APIs regarding ANN-based machine learning [9]. Based on the principle of learn by example, 
the ANN model consists of input, hidden, and output layers as well as the links between them [10]. We think 
as the climate changes not so instantaneously and the current weather is correlated with previous ones, the 
generation amount will behave much likely. The sequence of daily generation is converted to a set of learning 
records. The modeling process takes the generation amounts for the 4 previous days as inputs and the current 


day generation as output of the neural network. The number of nodes in the hidden layer is empirically 
selected to be 30. 
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Figure 7. Trace and prediction model for Loc1 (30kwh capacity) 


Each of Figure 7 and Figure 8 includes a vertical bar on 01-01-17. Daily generations from this day 
are not used in the learning phase. As can be seen in those figures, the fitting error is quite small and comes 
from the time lagging effect. The average error is larger in Loc2 having higher capacity. On the contrary, in 
the prediction part shown in the right-hand side of the vertical bar, the difference between the two curves gets 
quite severe from time to time. The maximum errors reach 147kw and 265kw on Loc1 and Loc2, mainly due 
to time lagging in tracing the changing pattern. The average errors for two places are 26kw and 47kw, which 
correspond to 27.6 and 29.6%, respectively. On the first day of a pattern change, the error size gets higher, 
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for example, the model predicts the maximum amount of generation but the actual generation is the 
minimum, or vice versa. Anyway, we will integrate the weather forecast to this model and has found the 
temporal dependency in the data stream of solar energy generation. 
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Figure 8. Trace and prediction model for Loc! (40 kwh capacity) 


4. CONCLUSION 

In this paper, we have described out data acquisition process from the management system of solar 
energy generations. For the hourly generator outputs on 7 panels during the period from January 2016 to 
March 2017, the average, maximum, and total amounts of generated electricity are analyzed on all-time, 
monthly, and daily basis. An ANN-based prediction model, built upon the sequential record set, shows 
average prediction error of 27.6%, mainly due to time lagging, making it necessary to integrate more 
machine learning process and other parameters. 

Currently, we are conducting an analysis on the behavior of EV (Electric Vehicle) chargers, aiming 
at integrating more renewable energy for EV charging [12]. By combining solar energy generation and EV 
charging demand, it will be possible to shift the charging demand across the group of chargers belonging to a 
microgrid having solar energy plants [13], [14]. 
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